Convergence Analysis of Gradient Descent Algorithms with Proportional Updates

نویسندگان

  • Igor Gitman
  • Deepak Dilipkumar
  • Ben Parr
چکیده

The rise of deep learning in recent years has brought with it increasingly clever optimization methods to deal with complex, non-linear loss functions [13]. These methods are often designed with convex optimization in mind, but have been shown to work well in practice even for the highly non-convex optimization associated with neural networks. However, one significant drawback of these methods when they are applied to deep learning is that the magnitude of the update step is sometimes disproportionate to the magnitude of the weights (much smaller or larger), leading to training instabilities such as vanishing and exploding gradients [12]. An idea to combat this issue is gradient descent with proportional updates.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Residual norm steepest descent based iterative algorithms for Sylvester tensor equations

Consider the following consistent Sylvester tensor equation[mathscr{X}times_1 A +mathscr{X}times_2 B+mathscr{X}times_3 C=mathscr{D},]where the matrices $A,B, C$ and the tensor $mathscr{D}$ are given and $mathscr{X}$ is the unknown tensor. The current paper concerns with examining a simple and neat framework for accelerating the speed of convergence of the gradient-based iterative algorithm and ...

متن کامل

Asymptotic and finite-sample properties of estimators based on stochastic gradients∗

Stochastic gradient descent procedures have gained popularity for parameter estimation from large data sets. However, their statistical properties are not well understood, in theory. And in practice, avoiding numerical instability requires careful tuning of key parameters. Here, we introduce implicit stochastic gradient descent procedures, which involve parameter updates that are implicitly def...

متن کامل

Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates

With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high probability convergence rate of O(κ/T ) for strongly convex functions, instead of O(κ ln(T )/T ). We also prove that an accelerated SGD algorithm also achieves a rate of O(κ/T ).

متن کامل

Random Multi-Constraint Projection: Stochastic Gradient Methods for Convex Optimization with Many Constraints

Consider convex optimization problems subject to a large number of constraints. We focus on stochastic problems in which the objective takes the form of expected values and the feasible set is the intersection of a large number of convex sets. We propose a class of algorithms that perform both stochastic gradient descent and random feasibility updates simultaneously. At every iteration, the alg...

متن کامل

Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning

In this paper, we consider the minimization of a convex objective function defined on a Hilbert space, which is only available through unbiased estimates of its gradients. This problem includes standard machine learning algorithms such as kernel logistic regression and least-squares regression, and is commonly referred to as a stochastic approximation problem in the operations research communit...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1801.03137  شماره 

صفحات  -

تاریخ انتشار 2018